RELATED APPLICATION
Other applications of particular interest to and commonly 
assigned with the present application include: SC/Serial No. 
09/266,558 entitled METHOD AND APPARATUS FOR INTERACTIVE 
SIMILARITY SEARCHING, RETRIEVAL, AND BROWSING OF VIDEO 
filed March 11, 1999 by Jonathan T. Foote, Lynn D. Wilcox and 
Andreas Girgensohn. 

Field of the Invention 

The present invention relates to obtaining information, and in 
particular linking types of information. 

Background 

Events, such as meetings, may be recorded to save important 
information. Often, a video of the meeting may contain important 
information which may be retrieved by an individual who may not have 
been able to attend. During the meeting, a presenter or participant 
may have a paper handout in order to enhance their presentation. The 
discussion pertaining to a particular handout or slide may be a 
significant aspect of the meeting. A subsequent viewer of the video 
may wish to view only a segment of the video pertaining to a
particular paper handout. 

Paper handouts are widely used and individuals feel comfortable
interacting with them during a meeting. Individuals can make notes
and annotations on the handouts without consciously thinking about
how to do it. This is not the case with most electronic documents and
devices, especially for a gathering of meeting participants who may
not have been trained in the technology. The reason is that during a
meeting when people must pay attention and participate, only the
simplest technologies having an unobtrusive form factor and an
undemanding user interface are usable.

The Cornell Lecture Browser allows a user to view a segment
of a video pertaining to a digital document in a specified format. The
Cornell Lecture Browser matches the specified digital document to a
section of a videotaped event. However, a pre-formatted digital file
of the paper handout may not be available for every event. A paper
handout may be the only document available at the meeting. Further,
creating a specified digital form of a paper handout may require
special equipment or knowledge not available to a participant.

In contrast, a paper handout may be easily scanned to create a
digital file. Scanning equipment is relatively inexpensive and easy to
operate. Nevertheless, in scanning the paper handout and/or slide, a
number of problems may be encountered which do not have to be
overcome in using the Cornell Lecture Browser. First, the scanned
document may have substantial margins which would hinder matching
of the scanned paper handout to a segment of the video tape.
Second, the scanned document may be slightly rotated during
scanning, resulting in a skewed scanned document which may hinder
matching of the scanned document to a segment of the video. Third,
the paper handout may be in monochrome while the video may be
videotaped in color, further hindering a match between the scanned
document and a segment of the video. Fourth, the paper handout may
be scanned with handwritten annotations, further complicating the
matching of the scanned document and a segment of the video.

Therefore, it is desirable to provide a method, system and article
of manufacture containing software which links a scanned document
to a segment of a video. The method, system and article of
manufacture should allow for linking a scanned document that has
substantial margins or was rotated during scanning. The method, system
and article of manufacture should be able to link a monochrome
scanned document to a color video, or vice versa.

SUMMARY OF INVENTION

According to an embodiment of the present invention, a method
for linking a scanned document to a segment of a video is provided.
The margins of a scanned document are removed and the document
is scaled. The scanned document is transformed into a scanned
document identifier. A video file having a plurality of video frames is
then obtained. The plurality of video frames is transformed into a
plurality of respective video frame identifiers. The scanned document
identifier is then compared to the plurality of video frame identifiers.
The scanned document is linked to a first video frame in the plurality
of video frames responsive to the comparison step.

According to an embodiment of the present invention, the step of
transforming the scanned document includes using an orthonormal
transform.

According to yet another embodiment of the present invention,
the orthonormal transform is a discrete cosine transform.

According to still another embodiment of the present invention,
the transforming step includes removing the least significant
information.

According to another embodiment of the present invention, the 
transforming the plurality of video frames includes using an 
orthonormal transform, such as a discrete cosine transform. 

According to another embodiment of the present invention, the
comparing step includes comparing color information from the scanned
document to color information from the plurality of video frames.

According to another embodiment of the present invention, the
linking step includes linking the scanned document to a first and
a second frame in the plurality of video frames.

According to another embodiment of the present invention, the
scanned document includes handwritten annotations.

According to another embodiment of the present invention, an
information system for linking a scanned document to a segment of a
video is provided. The information system comprises a first processing
device coupled to a persistent storage device. The persistent storage
device stores linking software which creates a link between a scanned
document file and a segment of a video file responsive to a
comparison of the transformed scanned document and a transformed
video frame.

According to another embodiment of the present invention, the
information system further comprises a scanner coupled to the first
processing device. The scanner creates the scanned document or
digital file from a physical document.

According to another embodiment of the present invention, a
camera is coupled to the first processing device and is used for
recording the video file.

According to another embodiment of the present invention, a 
second processing device is coupled to the first processing device and 
is used for viewing the segment of the video responsive to a selection 
of the scanned document. 

According to still another embodiment of the present invention,
an article of manufacture including a computer-readable memory
having a first software program for obtaining the scanned document
is provided. The article of manufacture also includes a second
software program for obtaining a video file and a third software
program for linking the scanned document to a segment of the video
file.

According to another embodiment of the present invention, the
third software program includes an orthonormal transform and a
scaling software program.

According to another embodiment of the present invention, the 
third software program includes a software program for removing the 
least significant information and removing margins of the scanned 
document. 

According to another embodiment of the present invention, 
a fourth software program removes annotations from the scanned 
document. 

BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 illustrates a video frame displayed in a browser according 
to an embodiment of the present invention; 

Fig. 2 illustrates a video of a meeting using the scanned
document illustrated in Fig. 1 according to an embodiment of the
present invention;

Fig. 3 illustrates viewing a scanned document referencing a
segment of a video according to an embodiment of the present
invention;

Figs. 4a-c illustrate a method for linking a scanned document to
a segment of a video according to an embodiment of the present
invention; and

Fig. 5 illustrates a simplified hardware and software block
diagram according to an embodiment of the present invention.

DETAILED DESCRIPTION

Figs. 1 and 5 illustrate browser interface 100a for viewing a
scanned document 101a used in a recorded event according to an
embodiment of the present invention. As will be described in detail
below, browser 100 and linking software 800 enable a user on the
Internet 880 to view a segment of a recorded event 50, such as a
meeting, which references scanned document 101a.

In an embodiment of the present invention, user interface 100a
includes pull-down menus 103 which include File, Edit, View, and Help
functions. Controls 102 allow a user to control the playing of a
video. In an embodiment of the present invention, controls 102 allow
a user to control the playing of video file 200, which is a video
recording of event 50 by camera 851 as illustrated in Fig. 5. Window
203 allows for the display of a scanned document and for the playing
back of recorded event 50. Bar chart 104 illustrates the frequency and
duration with which a particular scanned document is used in event 50.

Fig. 2 illustrates browser interface 100a having window 203
used to display a video file 200, in particular, a video frame 200a of
video file 200. In the video frame 200a, document projection 101b is
recorded and shown in window 203. Window 105 displays the
relative time at which video frame 200a is displayed during the course
of event 50. Similarly, bar 104a represents the time and duration for
which document 101 is presented as projection 101b in window 203.

Fig. 3 illustrates browser interface 300 for accessing a segment
of video file 200 which references scanned document 101a. Similar
to Fig. 1, browser interface 300 includes window 302 for viewing
scanned documents 301 and 101a. In an embodiment of the present
invention, a user may double-click on the scanned document 301 or
101a and browser interface 100a will play in window 203 the
corresponding recorded segment of event 50 referencing the scanned
document. In an alternate embodiment of the present invention,
scanned documents 301 and 101a may be represented by hyperlinked
uniform resource locator ("URL") addresses.

Linking software 800 as illustrated in Fig. 5 links a scanned
document to a corresponding segment of a video file in which the
scanned document is referenced. Embodiments of the present
invention allow a user to view significant segments of a recorded
event without having to view the entire recording. Thus, large
amounts of time and resources may be saved by viewing only
significant segments of event 50. Moreover, embodiments of the
present invention allow a user to use relatively inexpensive and
user-friendly scanning equipment in linking the segment of the
recording to the scanned document.

Figs. 4a-c illustrate method 200 for linking a scanned document
to a segment of a recorded event. As one who is skilled in the art
would appreciate, Figs. 4a-c illustrate logic blocks for performing
specific functions. In alternate embodiments, more or fewer logic
blocks may be used. In an embodiment of the present invention, logic
blocks may represent software programs, software objects, software
subroutines, code fragments, hardware operations, user operations,
singly or in combination.

Method 200 initiates by digitally scanning a document 101,
as illustrated by logic block 201, to create scanned document 101a or
a digital representation of document 101. In an embodiment, document
101 is a physical paper handout. In an alternate embodiment, document
101 has handwritten annotations or is a slide for a projection. The
margins are then removed from the scanned document as illustrated
by logic block 202. Black margins caused by scan converters and
white margins in paper handouts have significant effects in
transforming the document as described below. This problem is
solved by reducing the bounding box of an image until all the margins
having a uniform color are removed. It may be necessary to remove
several layers of uniform margins in cases where a slide with a
uniform, non-white background is printed on a white sheet of paper
and compared to an image without any artificial margins.
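
By way of illustration only, the margin removal of logic block 202 might be sketched in Python as follows. The sketch assumes a grayscale image held as a list of rows of pixel values; the function name `crop_uniform_margins` and the `tolerance` parameter are hypothetical and do not appear in the embodiments above.

```python
def crop_uniform_margins(image, tolerance=0):
    """Shrink the bounding box while any outer edge has a uniform value.

    `image` is a list of rows of grayscale values. Edges are peeled one
    at a time, so nested margins (for example white paper around a
    uniformly colored slide background) are removed layer by layer.
    """
    top, bottom = 0, len(image)
    left, right = 0, len(image[0])

    def uniform(values):
        values = list(values)
        return max(values) - min(values) <= tolerance

    # Stop when no edge is uniform or only a single row or column is left.
    while bottom - top > 1 and right - left > 1:
        if uniform(image[top][left:right]):
            top += 1
        elif uniform(image[bottom - 1][left:right]):
            bottom -= 1
        elif uniform(row[left] for row in image[top:bottom]):
            left += 1
        elif uniform(row[right - 1] for row in image[top:bottom]):
            right -= 1
        else:
            break
    return [row[left:right] for row in image[top:bottom]]
```

A nonzero `tolerance` would let the sketch cope with scanner noise in margins that are only approximately uniform.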

The scanned document is then scaled as illustrated by logic
block 203. In an embodiment of the present invention, the document
is scaled to approximately 64 x 64 pixels.
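
The scaling of logic block 203 could, for example, be approximated by averaging the source pixels that fall into each target cell. The following Python sketch is one such approach under stated assumptions: the resampling method is not specified above, the aspect ratio is not preserved, and the name `scale_image` is hypothetical.

```python
def scale_image(image, size=64):
    """Resample a grayscale image (list of rows) to size x size pixels
    by averaging the source pixels that fall inside each target cell."""
    src_h, src_w = len(image), len(image[0])
    scaled = []
    for ty in range(size):
        y0 = ty * src_h // size
        y1 = max(y0 + 1, (ty + 1) * src_h // size)  # at least one source row
        row = []
        for tx in range(size):
            x0 = tx * src_w // size
            x1 = max(x0 + 1, (tx + 1) * src_w // size)  # at least one column
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        scaled.append(row)
    return scaled
```

Averaging, rather than picking single pixels, already suppresses some fine detail such as thin pen strokes before the transform is applied.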

An orthonormal transform is then performed on the document
as illustrated by logic block 204. The transform is applied to the entire
scanned document 101a rather than small sub-blocks as is common
in image compression. In a preferred embodiment of the present
invention, a discrete cosine transform (DCT) is performed on scanned
document 101a to create a document identifier or a series of DCT
coefficients.
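
A minimal sketch of a whole-image transform, assuming a square grayscale image (such as the scaled 64 x 64 document) stored as a list of rows, is given below. It builds the orthonormal DCT-II basis matrix explicitly and applies it to rows and columns; the helper names are hypothetical, and a production system would more likely call an optimized library routine.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def dct2(image):
    """2-D DCT of the entire square image, not of small sub-blocks."""
    c = dct_matrix(len(image))
    c_transposed = [list(col) for col in zip(*c)]
    return matmul(matmul(c, image), c_transposed)
```

Because the basis is orthonormal, Euclidean distances between images are preserved in the coefficient domain, which is what makes a later distance comparison on truncated coefficients meaningful.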

The transform can be performed on grayscale images, or color
information can be captured by calculating one or more additional
signatures based on the color information. This is accomplished by
computing an additional signature for the chromatic components of the
scanned document (the UV components in the YUV color space) to
add to the existing luminance (Y) signature. Because the chromatic
components need less spatial resolution, they could be represented
with smaller signatures.
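
As a hedged illustration, the separation into luminance and chromatic planes might look as follows in Python. The BT.601 weights are an assumed choice of YUV definition, the factor-of-two chroma reduction in `halve` is likewise an assumption, and both function names are hypothetical. Each resulting plane would then be passed through the same transform as a grayscale image.

```python
def rgb_to_yuv_planes(rgb_image):
    """Split an RGB image (rows of (r, g, b) tuples) into Y, U and V planes
    using BT.601 weights (an assumed YUV definition)."""
    y_plane, u_plane, v_plane = [], [], []
    for row in rgb_image:
        ys, us, vs = [], [], []
        for r, g, b in row:
            y = 0.299 * r + 0.587 * g + 0.114 * b
            ys.append(y)
            us.append(0.492 * (b - y))
            vs.append(0.877 * (r - y))
        y_plane.append(ys)
        u_plane.append(us)
        v_plane.append(vs)
    return y_plane, u_plane, v_plane


def halve(plane):
    """Average 2x2 blocks so a chroma plane is carried at lower resolution."""
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) / 4.0
             for x in range(0, len(plane[0]) - 1, 2)]
            for y in range(0, len(plane) - 1, 2)]
```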

Alternatively, each YUV or RGB color component could be
treated as a separate image. Thus three signatures would be
calculated and compared for each scanned document. This would
allow weighting by overall color in the similarity metric.

Yet another embodiment using color information combines
obtaining a luminance (Y) signature with obtaining color
histograms. In the first step, scanned images would be found by
luminance signature similarity. The top-ranking scanned images could
then be examined using color-histogram similarity or a similar approach.
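
The two-step search just described might be sketched as follows. The sketch assumes each candidate is a tuple of a name, a luminance signature and a normalized color histogram; histogram intersection is an assumed choice of histogram similarity, and the `shortlist` size and function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms; 1.0 means identical."""
    return sum(min(x, y) for x, y in zip(h1, h2))


def two_stage_search(query_signature, query_histogram, candidates, shortlist=5):
    """Rank candidates given as (name, luminance_signature, color_histogram)."""
    # Stage 1: keep the candidates closest in luminance signature.
    closest = sorted(candidates,
                     key=lambda c: euclidean(query_signature, c[1]))[:shortlist]
    # Stage 2: re-rank only that shortlist by color-histogram similarity.
    return sorted(closest,
                  key=lambda c: histogram_intersection(query_histogram, c[2]),
                  reverse=True)
```

Restricting the histogram comparison to the shortlist keeps the color step from promoting a candidate whose layout does not match at all.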

While color histograms are even less sensitive than DCT
coefficients to image translation and rotation, they are not suitable as
the only means for finding matches between a scanned document and
a video file of the document. Two scanned documents with the same
background and the same amount of text would produce the same
color histograms even if the text is different. When using DCT
coefficients, changes in the text distribution on the page can be
recognized with a few low-frequency coefficients. For distinguishing
different words of the same length in the same position, more
high-frequency coefficients are needed. A proper balance between
word-sensitivity and filtering out annotations is necessary.
Nevertheless, color information or signatures may be used as a part
of an image identifier.
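
The limitation of histograms can be seen in a small, purely illustrative Python example. It assumes two toy 4 x 4 "pages" whose dark region sits on opposite sides; the helper names are hypothetical, and only a single DCT coefficient is computed for brevity.

```python
import math
from collections import Counter


def histogram(image):
    """Count how often each pixel value occurs, ignoring position."""
    return Counter(value for row in image for value in row)


def dct_coefficient(image, u, v):
    """One orthonormal 2-D DCT-II coefficient of a square image
    (u = vertical frequency, v = horizontal frequency)."""
    n = len(image)
    au = math.sqrt((1 if u == 0 else 2) / n)
    av = math.sqrt((1 if v == 0 else 2) / n)
    return au * av * sum(
        image[y][x]
        * math.cos(math.pi * (2 * y + 1) * u / (2 * n))
        * math.cos(math.pi * (2 * x + 1) * v / (2 * n))
        for y in range(n) for x in range(n))


dark_left = [[0, 0, 255, 255] for _ in range(4)]
dark_right = [[255, 255, 0, 0] for _ in range(4)]

same_histogram = histogram(dark_left) == histogram(dark_right)
low_freq_left = dct_coefficient(dark_left, 0, 1)
low_freq_right = dct_coefficient(dark_right, 0, 1)
```

Here `same_histogram` is true, while the first horizontal coefficient has opposite signs for the two pages, so a single low-frequency coefficient already separates layouts that the histograms cannot.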

The transformed data is then reduced by discarding the least 
significant information. In an embodiment, the scanned document 
coefficients are truncated as illustrated by logic block 205. In an 
embodiment, principal component analysis (PCA) or linear discriminant 
analysis (LDA) may be used to discard least significant information. 
In the preferred embodiment, the coefficients having the highest
variation are selected. In an embodiment of the present invention, the
256 lowest frequency DCT coefficients are kept using a low-pass 
filter. Truncated DCT coefficients are insensitive to small differences 
in image translation, rotation, or scale as can happen while scanning 
paper handouts. 
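
Two of the reduction strategies above might be sketched as follows. The first assumes that "the 256 lowest frequency coefficients" are the 16 x 16 top-left block of the coefficient array, which is one possible reading; a zigzag ordering would be another. The second picks the positions whose values vary most across a collection of signatures. Both function names are hypothetical.

```python
def truncate_low_pass(coefficients, keep=16):
    """Return the keep x keep lowest-frequency block, flattened row by row
    (16 x 16 = 256 values by default)."""
    return [value for row in coefficients[:keep] for value in row[:keep]]


def highest_variance_indices(signatures, count):
    """Indices of the `count` coefficient positions with the largest
    variance across a collection of equal-length signatures."""
    n = len(signatures)

    def variance(i):
        values = [s[i] for s in signatures]
        mean = sum(values) / n
        return sum((v - mean) ** 2 for v in values) / n

    ranked = sorted(range(len(signatures[0])), key=variance, reverse=True)
    return sorted(ranked[:count])
```

The variance-based selection needs a representative set of signatures up front, whereas the low-pass block can be applied to each image in isolation.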

The scanned document coefficients are then normalized as 
illustrated by logic block 206. 
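
The form of normalization is not specified above. One common possibility, shown here purely as an assumption, is to scale each coefficient vector to unit Euclidean length so that distances do not depend on overall contrast; the function name `normalize` is hypothetical.

```python
import math


def normalize(coefficients):
    """Scale a coefficient vector to unit Euclidean length (assumed scheme).
    An all-zero vector is returned unchanged to avoid dividing by zero."""
    norm = math.sqrt(sum(c * c for c in coefficients))
    if norm == 0:
        return list(coefficients)
    return [c / norm for c in coefficients]
```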

The document coefficients are then saved as illustrated by logic 
block 207. 

A decision is then made whether or not there is a need to scan 
another document as illustrated by logic block 208. If another 
document needs to be scanned, logic blocks 201-207 are repeated. 
Otherwise, control transitions to logic block 209. 

A digital video file is then obtained as illustrated by logic block 
209. In an embodiment of the present invention, a camera 851 is 
positioned in a room in which a meeting is occurring so that the 
meeting may be recorded as video file 200. A video frame from video 
file 200 is then retrieved as illustrated by logic block 210. In an 
embodiment of the present invention, every sixth video frame is
obtained. 
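
Sampling every sixth frame could be expressed as in the following sketch, which assumes the decoded frames are available as an iterable and that the video runs at 30 frames per second; the frame rate and the name `sample_frames` are assumptions, not part of the embodiment.

```python
def sample_frames(frames, step=6, fps=30.0):
    """Yield (frame_index, seconds, frame) for every `step`-th frame."""
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index, index / fps, frame
```

Keeping the frame index and timestamp alongside each sampled frame makes it straightforward to link a later match back to a playback position.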

The video frame margins are removed as illustrated by logic block
210a and described above.

The video frame is then scaled as illustrated by logic block 210b
and described above.

An orthonormal transform is then performed on the video frame,
as illustrated by logic block 211, to provide a video frame identifier.
In an embodiment of the present invention, the orthonormal transform
is a discrete cosine transform as described above. The video frame
coefficients are then truncated as illustrated by logic block 212 and
as described above. In an embodiment of the present invention, the
256 lowest frequency DCT coefficients are obtained.

The DCT coefficients are then normalized as illustrated by logic 
block 213. 

The video frame coefficients are then saved as illustrated by 
logic block 214. 

A determination is made whether there are any remaining video 
frames as illustrated by logic block 215. If there are remaining video 
frames, logic transitions back to logic block 210 and logic blocks 
210-214 are repeated. Otherwise, logic control transitions to logic 
block 216a. 

Saved document coefficients from a scanned document are 
retrieved as illustrated by logic block 216a. 

Saved frame coefficients from a frame of the video are then 
retrieved as illustrated by logic block 216b. 

A calculation is then performed between the retrieved DCT
coefficients representing the scanned document 101a and the DCT
coefficients of a video frame as illustrated by logic block 216. The
similarity between any particular video frame and the scanned image
can be calculated by measuring the similarity between the coefficients.
In an embodiment of the present invention, a Euclidean distance
between document coefficients and video frame coefficients is
calculated. This measure can be usefully weighted to increase
performance; for example, the DC component can be weighted less so
the method depends less on overall image brightness.
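
A weighted Euclidean distance of this kind might be sketched as below. The sketch assumes the DC term is stored first in each coefficient vector; the weight value of 0.1 and the function name are illustrative assumptions.

```python
import math


def weighted_distance(document, frame, dc_weight=0.1):
    """Euclidean distance in which coefficient 0 (the DC term) is
    down-weighted so overall brightness matters less."""
    total = 0.0
    for index, (d, f) in enumerate(zip(document, frame)):
        weight = dc_weight if index == 0 else 1.0
        total += weight * (d - f) ** 2
    return math.sqrt(total)
```

With `dc_weight=0.0` a uniformly brighter or darker copy of the same page would compare as identical, while `dc_weight=1.0` reduces to the plain Euclidean distance.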

According to an embodiment of the present invention, a decision
is made whether the Euclidean distance is less than a predetermined
threshold as illustrated by logic block 217. If the distance is not less
than the predetermined threshold, logic transitions back to logic block
216b. Otherwise, logic transitions to logic block 218.

According to another embodiment of the present invention, logic
block 217 is replaced with determining a minimum number of
Euclidean distances between document coefficients and video frame
coefficients. For example, the five smallest Euclidean distances are
used as matches.
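
The two decision rules, a fixed threshold and a fixed number of nearest frames, can be contrasted in a short sketch. It assumes the frame coefficient vectors are held in a list so that a list index identifies the frame; the function names are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def frames_within_threshold(document, frames, threshold):
    """Indices of all frames whose distance is below the threshold."""
    return [i for i, frame in enumerate(frames)
            if euclidean(document, frame) < threshold]


def nearest_frames(document, frames, count=5):
    """Indices of the `count` frames with the smallest distances,
    closest first."""
    return heapq.nsmallest(count, range(len(frames)),
                           key=lambda i: euclidean(document, frames[i]))
```

The threshold rule can return no match at all for a handout that never appears, whereas the nearest-frames rule always returns candidates and therefore may need a plausibility check.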

The scanned digital document is then linked or indexed to the
video frame as illustrated by logic block 218. A determination is made
whether coefficients of another scanned document are to be retrieved,
as illustrated by logic block 219. If there is another scanned
document, logic transitions to logic block 216a where logic blocks
216a-218 are repeated. Otherwise, the method exits.

A paper handout may be referenced multiple times in event 50.
For example, during a question and answer period, the speaker may go
back to a paper handout referred to in a question. To handle this,
method 200, as described above, makes a complete pass through
video file 200 (rather than stopping at the first match), and determines
a sequence of appearances plus the duration of each appearance.
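
One way to turn the matched frames into a sequence of appearances is to merge matches that are adjacent samples into a single run. The following sketch assumes every sixth frame was examined and a frame rate of 30 frames per second; both values and the function name `appearances` are assumptions.

```python
def appearances(matched_indices, step=6, fps=30.0):
    """Group matched frame indices into (start_seconds, duration_seconds).

    Two matches belong to the same appearance when they are adjacent
    samples, i.e. no more than `step` frames apart.
    """
    segments = []
    start = previous = None
    for index in sorted(matched_indices):
        if start is None:
            start = previous = index
        elif index - previous <= step:
            previous = index
        else:
            segments.append((start / fps, (previous - start + step) / fps))
            start = previous = index
    if start is not None:
        segments.append((start / fps, (previous - start + step) / fps))
    return segments
```

For instance, matches at frames 0, 6, 12, 300 and 306 would yield two appearances, one starting at 0 seconds and one at 10 seconds.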

In the user interface, this information on the number of
appearances and the duration of each appearance can be shown using
"temporal icons." Temporal icons are clock icons showing a time
representing the amount of time the paper handout was referenced.
The icon also has a ring around the bezel (similar to a diver's watch)
with an arc showing the duration of the appearance. Furthermore, the
color of the arc and/or the clock face changes to indicate
whether this segment of the event has been visited or partially visited
during playback.

Paper handouts are very easy to annotate. The ink annotations
can be ignored for the image matching described above. The reason
is that ink strokes, being thin lines, contribute only high-frequency
coefficients, which are discarded.

To extract the ink annotations, a simple comparison between 
the original and the annotated paper handout may be performed. 
When more than one set of handouts has been annotated, these ink
strokes may be extracted and selectively layered over the common 
background of the scanned document. Another way to display the ink 
annotations and notes is simply to show them without a background. 
In any case, the ink strokes may be hyperlinked to play their 
corresponding segment in the video recording. 
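
The simple comparison and layering described above might be sketched as follows. The sketch assumes the original and annotated scans are already aligned, of equal size and in grayscale with dark ink on a lighter background; the `threshold` value and the function names are hypothetical.

```python
def extract_ink(original, annotated, threshold=64):
    """Return a mask (rows of booleans) marking pixels that are clearly
    darker in the annotated scan than in the original."""
    return [[(o - a) > threshold for o, a in zip(original_row, annotated_row)]
            for original_row, annotated_row in zip(original, annotated)]


def layer_ink(background, masks, ink_value=0):
    """Draw the strokes from the selected masks over a copy of the
    common background."""
    layered = [row[:] for row in background]
    for mask in masks:
        for y, mask_row in enumerate(mask):
            for x, is_ink in enumerate(mask_row):
                if is_ink:
                    layered[y][x] = ink_value
    return layered
```

Passing only some of the masks to `layer_ink` corresponds to selectively layering the annotations of particular participants.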

Fig. 5 shows hardware and software components of an 
exemplary information system suitable for linking scanned documents 
to segments of recorded events, according to an embodiment of the 
present invention. System 799 of Fig. 5 includes a processing device 
800 connected by one or more communication pathways, such as 
connection 829, to a local-area network (LAN) 840 and also to a 
wide-area network, here illustrated as the Internet 880. Through LAN 
840, processing device 800 can communicate with other processing 
devices, such as scanner 841 and camera 851. In an alternate
embodiment, scanner 841 and camera 851 are not physically
connected to LAN 840. Through the Internet 880, processing device
800 can communicate with other processing devices, both local and
remote, such as web client 881. As will be appreciated, the
connection from processing device 800 to Internet 880 can be made 
in various ways, e.g., directly via connection 829 (wired or wireless), 
or through local-area network 840, or by modem (not shown). 

Processing device 800 in an embodiment of the present 
invention is a personal or office computer. An exemplary embodiment 
uses a Dell® Dimension® XPS B Series desktop computer (Dell 
Computer Company, Round Rock, TX). In an alternate embodiment, 
processing device 800 is a personal digital assistant, hand-held 
computer, "Smart" telephone, information appliance, or an equivalent 
thereof. For purposes of exposition, processing device 800 can be 
conveniently divided into hardware components 801 and software 
components 802; however, persons of skill in the art will appreciate 
that this division is conceptual and somewhat arbitrary, and that the 
line between hardware and software is not a hard and fast one. 
Further, it will be appreciated that the line between a host computer 
and its attached peripherals is not a hard and fast one, and that in 
particular, components that are considered peripherals of some 
computers are considered integral parts of other computers. 

Hardware components 801 include a processor (CPU) 805, 
memory 806, persistent storage 808, user I/O 820, and network 
interface 825. These components are well understood by those of skill 
in the art and, accordingly, need be explained only briefly here. 

Processor 805 can be, for example, a microprocessor or a 
collection of microprocessors configured for multiprocessing. It will be 
appreciated that the role of processing device 800 can be taken in
some embodiments by multiple computers acting together (distributed 
computation); in such embodiments, the functionality of processing 
device 800 in the system of Fig. 5 is taken on by the combination of 
these processing devices, and the processing capabilities of processor 
805 are provided by the combined processors of the multiple 
computers. 

Memory 806 can include a computer readable medium such as 
read-only memory (ROM), random-access memory (RAM), virtual 
memory, or other memory technologies, singly or in combination. 
Persistent storage 808 can include a computer readable medium, for 
example, a magnetic hard disk, a floppy disk, or other persistent read- 
write data storage technologies, singly or in combination. It can 
further include mass or archival storage, such as can be provided by 
CD-ROM or other large-capacity storage technology. (Note that web
client 881 may have similar software and hardware components.)
In an embodiment, persistent storage 808 stores a scanned digital 
document 101a and video file 200. 

User I/O (input/output) hardware 820 typically includes a visual 
display monitor such as a CRT or flat-panel display, an alphanumeric 
keyboard, and a mouse or other pointing device, and optionally can 
further include a printer, an optical scanner, or other devices for user 
input and output. In an embodiment, user I/O 820 is used to select 
the playback of a segment of video file 200 corresponding to a 
scanned digital document 101a. 

Network I/O hardware 825 provides an interface between 
processing device 800 and the outside world. More specifically, 
network I/O 825 lets processor 805 communicate via connection 829 
with other processors and devices through LAN 840 and through the
Internet 880. 

Software components 802 include an operating system 900 and
a set of tasks under control of operating system 900. As known by
one of ordinary skill in the art, operating system 900 also allows
processor 805 to control various devices such as persistent storage
808, user I/O 820, and network interface 825. Processor 805
executes the software of operating system 900 and its tasks in
conjunction with memory 806 and other components of computer
system 800.

In an embodiment, software 802 includes browser 100 and
video player 901 for playing video file 200. In an embodiment, video
player 901 is a Moving Picture Experts Group (MPEG) player or
RealVideo player. In an embodiment of the present invention, browser
300 may be a Netscape 6.0 browser provided by Netscape
Communications Corporation located in Mountain View, California.

In an embodiment of the present invention, linking software 808
is stored on a computer-readable medium such as a magnetic hard
disk, floppy disk, CD-ROM, or other writeable data storage
technologies, singly or in combination.

Persons of skill in the art will appreciate that the systems of
Fig. 5 are intended to be illustrative, not restrictive, and that a wide
variety of computational, communications, and information and
document processing devices can be used in place of or in addition to
what is shown in Fig. 5. For example, connections through the
Internet 880 generally involve packet switching by intermediate router
computers (not shown), and processing device 800 is likely to access
any number of processing devices, including but by no means limited
to scanner 841 and camera 851.

The foregoing description of the preferred embodiments of the
present invention has been provided for the purposes of illustration 
and description. It is not intended to be exhaustive or to limit the 
invention to the precise forms disclosed. Obviously, many 
modifications and variations will be apparent to practitioners skilled in 
the art. The embodiments were chosen and described in order to best 
explain the principles of the invention and its practical applications, 
thereby enabling others skilled in the art to understand the invention 
for various embodiments and with the various modifications as are 
suited to the particular use contemplated. It is intended that the scope 
of the invention be defined by the following claims and their 
equivalents: 